Substitution Matrices of Residue Triplets Derived from Protein Blocks
Abstract
In protein sequence alignment, residue similarity is usually evaluated with a substitution matrix, which scores every possible exchange of one amino acid for another. Several matrices are widely used, including the PAM matrices, derived from homologous sequences, and the BLOSUM matrices, derived from the aligned segments in BLOCKS. However, most matrices do not address the high-order residue-residue interactions that are vital to the biological properties of proteins. Taking into account the inherent correlation within a residue triplet, we present a new scoring scheme for sequence alignment. A protein sequence is treated as a series of successive, overlapping 3-residue segments. The two edge residues of each triplet are each classified as hydrophobic or polar, while the central residue keeps its identity. The protein sequence is thus rewritten as a triplet sequence over an alphabet of 2 x 20 x 2 = 80 symbols. Using a traditional approach, we construct a new scoring scheme named TLESUM(hp) (TripLEt SUbstitution Matrices with hydrophobic and polar information) for pairwise substitution of triplets, which characterizes the similarity of residue triplets. Applying this matrix led to marked improvements in multiple sequence alignment and in searching for structurally similar residue segments. The reason for the occurrence of the "twilight zone," i.e., the structure explosion of low-identity sequences, is also discussed.
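The triplet rewriting described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which residues count as hydrophobic, so the `HYDROPHOBIC` set below is a hypothetical partition, and the names `hp_class`, `encode_triplets`, and `TRIPLET_ALPHABET`, as well as the three-character symbol format, are invented here for the example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# ASSUMPTION: the abstract does not list the hydrophobic residues.
# This partition is hypothetical and only serves the illustration.
HYDROPHOBIC = set("AVLIMFWC")


def hp_class(residue):
    """Map a residue to 'h' (hydrophobic) or 'p' (polar)."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return "h" if residue in HYDROPHOBIC else "p"


def encode_triplets(sequence):
    """Rewrite a sequence as overlapping triplet symbols.

    Each symbol is <class of left edge><central residue><class of right edge>,
    so a sequence of length n yields n - 2 symbols.
    """
    seq = sequence.upper()
    return [
        hp_class(seq[i]) + seq[i + 1] + hp_class(seq[i + 2])
        for i in range(len(seq) - 2)
    ]


# Full alphabet: 2 edge classes x 20 central residues x 2 edge classes.
TRIPLET_ALPHABET = ["".join(t) for t in product("hp", AMINO_ACIDS, "hp")]

if __name__ == "__main__":
    print(len(TRIPLET_ALPHABET))
    print(encode_triplets("MKVLA"))
```

Because the windows overlap, every interior residue appears once as a center and twice as an edge, which is how the encoding carries neighbor information into the 80-symbol alphabet.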
Similar resources
Inconsistent Distances in Substitution Matrices can be Avoided by Properly Handling Hydrophobic Residues
The adequacy of substitution matrices to model evolutionary relationships between amino acid sequences can be numerically evaluated by checking the mathematical property of triangle inequality for all triplets of residues. By converting substitution scores into distances, one can verify that a direct path between two amino acids is shorter than a path passing through a third amino acid in the a...
Amino acid substitution matrices for protein conformation identification
Methods for alignment of protein sequences typically measure similarity by using substitution matrix with scores for all possible exchanges of one amino acid with another. Although widely used, the matrices derived from homologous sequence segments, such as Dayhoff’s PAM matrices and Henikoff’s BLOSUM matrices, are not specific for protein conformation identification. Using a different approach...
A Transition Probability Model for Amino Acid Substitutions from Blocks
Substitution matrices have been useful for sequence alignment and protein sequence comparisons. The BLOSUM series of matrices, which had been derived from a database of alignments of protein blocks, improved the accuracy of alignments previously obtained from the PAM-type matrices estimated from only closely related sequences. Although BLOSUM matrices are scoring matrices now widely used for pr...
Amino acid substitution matrices from protein blocks (amino acid sequence / alignment algorithms / data base searching)
Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more tha...
Amino acid substitution matrices from protein blocks.
Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more t...
Journal title:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 17, Issue 12
Pages: -
Publication date: 2010